1. Introduction

Automatic feature extraction systems are of considerable
importance in intelligence and mapping. They allow
images from a variety of sensor sources to be processed
by the application of analysis algorithms. High-resolution
displays are essential components of such systems.
This project is concerned with the design of a display
system which will allow such algorithms to be executed
much more rapidly than at present. It is a continuation
of work commenced at Rome Air Development Center under
the 1979 USAF-SCEEE Summer Faculty Research Program
(Contract No. F49620-79-C-0038) (1).

Raster graphic displays, either monochrome or
color, are invariably used for image presentation, and
employ semiconductor memory to store the image in a
quantized digital form. The data rate required for
raster scan TV output calls for a fairly elaborate memory
design, e.g. one that permits more than one pixel to be read
per memory cycle. Once the image data is stored in
such a memory, it is attractive to consider attaching
a separate processor to carry out typical image
processing functions, such as averaging, enhancement,
region-growing and collection of picture statistics.
Moreover, many of these functions can in principle
be carried out in parallel. This report
summarizes the design of a suitable pixel processor
(PXP) and its capabilities.

2.  Overview  of  the  Proposed  Work  Station 
The work station is a computer display system
dedicated to a single user at a time, and has storage,
computation and display functions. In a typical
automatic feature extraction system it will be only
one of a number of stations sharing access to
archival storage, image and other input/output
peripherals, and specialized arithmetic processors.

The  work  station,  shown  in  Fig  1,  consists 
of  two  processors  sharing  a  large,  common  memory.  One 
processor  is  conventional,  and  a  Zilog  Z8000  (segmented) 
unit  is  suggested,  since  this  would  provide  an 
8  Megabyte  address  space.  This  size  would  be  adequate 
for  the  images  expected,  namely  four  1024  x  1024  pixel 
images.  Note  that  this  is  an  improvement  over  the 
Intel  8086  solution  proposed  earlier.  The  other 
processor  is  the  specialized  pixel  processor  PXP  with 
which  this  report  is  principally  concerned. 
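
The adequacy of the address space can be checked with a little arithmetic. The sketch below assumes one byte per pixel per image (as stated later in Section 3.1) and reads "8 Megabyte" as 8 x 2^20 bytes; both constants are illustrative assumptions rather than figures taken from a hardware specification.

```python
# Memory budget sketch for the common memory of the work station.
IMAGE_SIDE = 1024            # pixels per row and per column
BYTES_PER_PIXEL = 1          # assumption: one byte per pixel per image
IMAGE_COUNT = 4              # four images are expected
ADDRESS_SPACE = 8 * 2**20    # assumption: 8 Megabytes read as 8 * 2**20 bytes

image_bytes = IMAGE_SIDE * IMAGE_SIDE * BYTES_PER_PIXEL
pixel_store = IMAGE_COUNT * image_bytes
remaining = ADDRESS_SPACE - pixel_store

print(f"one image:   {image_bytes} bytes")
print(f"four images: {pixel_store} bytes")
print(f"left over:   {remaining} bytes for programs and other data")
```

Under these assumptions the four images occupy half of the address space, leaving the other half for instructions, tables and working data.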

PXP  is  designed  to  display  the  selected  image 
at  the  necessary  data  rate  for  high-resolution  TV, 
and  in  a  separate  mode  to  carry  out  image  processing 
algorithms  at  high  speed  but  without  simultaneous 
display.  It  is  subservient  to  the  Z8000  processor. 


[Fig 1  Overall arrangement of Work Station incorporating PXP.
Labels recoverable from the figure: interface to group data base
processor; microprog. pixel processor; color tables; color monitor;
simple vector; memory of up to 8 x 10^6 bytes holding images #1 to #4,
each 1024 x 1024.]


While  the  pixel  memory  is  addressable  as  an 
integral  part  of  the  common  memory,  it  is  important 
to  note  that  its  organization  is  more  complex  than 
conventional  memory,  due  to  the  parallel  operation 
requirements  of  PXP  and  the  video  data  rate  of  the 
display. 

3.  The  Pixel  Processor  (PXP) 

PXP is a single-instruction path, multiple-data path
(SIMD) computer. Since each pixel, unless on an edge,
has 8 adjacent pixels, PXP has a vector arithmetic unit
with 9 elements, all of which may operate simultaneously
in obeying the current instruction. Fig 2 shows the
register layout, including the accumulator vector
of 9 single-byte element registers, along with the
vector condition registers (vz, vm, vc), which show the
results of the latest arithmetic or logical instruction
executed, using a single bit position corresponding
to each element of the vector. Often only selected
elements must be activated, and so a vector control
register (vcr) is provided.
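
The interaction of the accumulator vector, the vector control register and the condition registers can be illustrated with a small software model. This is only a sketch under stated assumptions: the report does not define the meaning of vz, vm and vc, so they are taken here to be per-element zero, minus (sign) and carry bits; it is also assumed that a disabled element keeps its old accumulator value and condition bits. The class name `VectorUnit` and the method `vadd` are hypothetical, and the loop merely imitates what the hardware would do in all elements at once.

```python
from dataclasses import dataclass, field

N = 9  # vector elements: a pixel and its 8 neighbours


@dataclass
class VectorUnit:
    acc: list = field(default_factory=lambda: [0] * N)  # 9 single-byte accumulators
    vcr: int = 0x1FF  # vector control register: bit i enables element i
    vz: int = 0       # assumed meaning: bit i set when element i result is zero
    vm: int = 0       # assumed meaning: bit i set when bit 7 of the result is set
    vc: int = 0       # assumed meaning: bit i set on carry out of bit 7

    def vadd(self, operand):
        """Add operand[i] to acc[i] in every element enabled by vcr."""
        for i in range(N):
            bit = 1 << i
            if not self.vcr & bit:
                continue  # assumption: disabled elements are left untouched
            total = self.acc[i] + operand[i]
            result = total & 0xFF
            self.acc[i] = result
            self.vz = (self.vz & ~bit) | (bit if result == 0 else 0)
            self.vm = (self.vm & ~bit) | (bit if result & 0x80 else 0)
            self.vc = (self.vc & ~bit) | (bit if total > 0xFF else 0)


# Usage: enable only the even-numbered elements and add 10 to each.
vu = VectorUnit(acc=[250, 1, 2, 3, 4, 5, 6, 7, 0])
vu.vcr = 0b101010101
vu.vadd([10] * N)
print(vu.acc, bin(vu.vc))
```

In this example element 0 overflows a single byte, so only bit 0 of vc is set, while the odd-numbered elements are unchanged because their vcr bits are clear.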

As well as vector-related registers, there
are the typical registers of a conventional processor,
e.g. an instruction pointer and a stack pointer. In
view of the large memory address range of the Z8000,
these registers have associated segment registers.


[Fig 2  Register layout of PXP.
Labels recoverable from the figure: accumulator vector (9 8-bit ALUs);
working registers; memory buffer; vector control register (9 bits);
vector condition registers (9 bits); general-purpose registers
(16 x 16 bit), of which gp registers 11 - 15 correspond to the
accumulator vector. N.B. register widths are not shown to scale.]


Together with a memory management unit (MMU), the segment
registers determine the physical address of any byte in
main memory. This aspect is discussed in detail
later under memory addressing. Finally there is
a set of general-purpose registers, of which a
sub-group corresponds to the vector register referred
to earlier. In consequence individual vector elements
can be addressed directly.

3.1  Addressing  Mechanisms  of  PXP 

It is important to distinguish between the one-dimensional
memory access required by PXP for instructions and
some data, and the two-dimensional access
required for pixel data, i.e. a pixel is referred to
by address (x,y). Also, within one memory cycle time
PXP may simultaneously read or write any or all
of 9 adjacent pixels.

One-dimensional  access  can  be  provided 
most  readily  with  a  pair  of  MMUs,  which  are  standard 
Z8000  components,  and  so  no  further  attention  will 
be  paid  to  this  feature. 

In vector processing PXP may require access
to a pixel and its eight immediate neighbors. This
"worst-case" situation can be handled by assigning
pixels to 16 sub-memories all operating in parallel.
In effect this is an interleaving method with a row
stagger of 4 pixels; it was discussed in the 1979
Report. The 1024 x 1024 pixels are stored in 16
memories of 64k words each. Note that each word is
4 bytes (32 bits) long, with one byte for each of
the four images.
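
The storage figures above can be verified, and the word format illustrated, with a short sketch. The capacity check follows directly from the numbers in the text. The byte order within a word is not given in this report, so placing image #1 in the least significant byte is purely an assumption, and the helper names `pack_word` and `unpack_pixel` are hypothetical.

```python
MODULES = 16
WORDS_PER_MODULE = 64 * 1024   # 64k words per sub-memory
IMAGES = 4                     # one byte per image in each 32-bit word

# One word per pixel position: 16 x 64k words cover 1024 x 1024 pixels.
total_words = MODULES * WORDS_PER_MODULE


def pack_word(pixels):
    """Pack four 8-bit pixel values (images #1..#4) into one 32-bit word.

    Assumption: image #1 occupies the least significant byte.
    """
    if len(pixels) != IMAGES:
        raise ValueError("expected one pixel value per image")
    word = 0
    for k, value in enumerate(pixels):
        if not 0 <= value <= 0xFF:
            raise ValueError("pixel values must fit in one byte")
        word |= value << (8 * k)
    return word


def unpack_pixel(word, image):
    """Return the byte belonging to image index 0..3 of a 32-bit word."""
    return (word >> (8 * image)) & 0xFF


word = pack_word([1, 2, 3, 4])
print(total_words, hex(word), unpack_pixel(word, 2))
```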

PXP presents a pixel address (x,y) to the
addressing mechanism. Depending on the mode selected
at the time, this may require a horizontal vector
of 9 pixels (2 cases - left or right) or a 3 x 3
set of pixels to be accessed. It is simple to
generate the set of (x,y) addresses; each pair
must then be converted to find the correct memory module (m)
and the word address within the module (w). Fig. 3 shows
the simplicity of the logic required, which takes
account of the row stagger referred to earlier. Note
that the 512 x 512 scheme given in the 1979 Report
has been extended for the larger images. The set
of memory modules selected will always be distinct,
i.e. no two pixels will be simultaneously required
from the same module.
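
Since the logic of Fig. 3 is not reproduced in this text, the following sketch shows one mapping that is consistent with the description (16 modules, a row stagger of 4 pixels, 64k words per module). The formulas m = (x + 4y) mod 16 and w = 64y + floor(x / 16) are an assumption chosen to satisfy those constraints, not necessarily the scheme of the 1979 Report. The reading of "left" and "right" as 9 pixels extending from (x,y) in that direction is also an assumption, as are all function names.

```python
SIDE = 1024      # image is SIDE x SIDE pixels
MODULES = 16     # number of sub-memories
STAGGER = 4      # row stagger in pixels


def module_and_word(x, y):
    """Map pixel (x, y) to (module m, word address w) under the assumed scheme."""
    m = (x + STAGGER * y) % MODULES
    w = y * (SIDE // MODULES) + x // MODULES
    return m, w


def address_set(x, y, mode):
    """Generate the pixel addresses for one vector access, clipped to the image."""
    if mode == "3x3":
        points = [(x + dx, y + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    elif mode == "right":
        points = [(x + i, y) for i in range(9)]
    elif mode == "left":
        points = [(x - i, y) for i in range(9)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [(px, py) for px, py in points if 0 <= px < SIDE and 0 <= py < SIDE]


def modules_distinct(x, y, mode):
    """True if no two pixels of the access fall in the same module."""
    modules = [module_and_word(px, py)[0] for px, py in address_set(x, y, mode)]
    return len(set(modules)) == len(modules)


print(module_and_word(0, 0), module_and_word(1023, 1023))
print(all(modules_distinct(100, 200, mode) for mode in ("3x3", "left", "right")))
```

With this choice the nine module offsets of a 3 x 3 access are -5, -4, -3, -1, 0, 1, 3, 4 and 5, which are all different modulo 16, and a horizontal run of 9 pixels trivially occupies 9 consecutive modules.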

The design of the "crossbar switch" to
allow 9 simultaneous memory addresses to be specified
to 9 of the 16 memory modules, and the corresponding
9 data words to be transferred, is a formidable task.
Without resorting to multiplexing, this would require
a crossbar switch with 9 x 28 = 252 lines from the
memory address unit and 16 x 25 = 400 lines to the
memory modules, plus a few extra selection and control
lines. There would be a total of 9 x 16 = 144
crosspoints, each controlling the transfer of
24 bits of data, of which 8 must be bi-directional.
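
The line counts quoted above can be reproduced from the memory dimensions. The report gives only the totals, so the breakdown used below is an assumption: 28 lines per vector element taken as 4 module-select bits, 16 word-address bits and 8 data bits, and 25 lines per module taken as 16 address bits, 8 data bits and one control line. It is offered only as one reading that happens to match the stated figures.

```python
ELEMENTS = 9                   # vector elements making simultaneous accesses
MODULES = 16                   # memory modules
WORDS_PER_MODULE = 64 * 1024   # 64k words per module
DATA_BITS = 8                  # one pixel byte per access (bi-directional)

module_bits = (MODULES - 1).bit_length()          # bits to name a module
word_bits = (WORDS_PER_MODULE - 1).bit_length()   # bits to address a word

# Assumed breakdown of the per-element and per-module line counts.
lines_per_element = module_bits + word_bits + DATA_BITS
lines_per_module = word_bits + DATA_BITS + 1      # +1: assumed control line
bits_per_crosspoint = word_bits + DATA_BITS

unit_side_lines = ELEMENTS * lines_per_element
memory_side_lines = MODULES * lines_per_module
crosspoints = ELEMENTS * MODULES

print(lines_per_element, unit_side_lines)     # 28 per element, 252 in total
print(lines_per_module, memory_side_lines)    # 25 per module, 400 in total
print(crosspoints, bits_per_crosspoint)       # 144 crosspoints of 24 bits
```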

The design of the crossbar switch is
dominated by packaging considerations. Clearly the
total number of lines must be reduced by multiplexing;
a reasonable compromise between time and lines seems
to be to multiplex both addresses and data over the
same lines, 4 bits at a time. Each memory cycle will
require 4 transfers for address specification and
2 transfers (bi-directional) for data. The overall
memory cycle time would be increased, but probably
by not more than 50% over the basic cycle time.
However, the necessary multiplexers and demultiplexers
must be added to the memory selection and memory units.
The crossbar points would employ tristate devices.
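
The transfer counts follow from the path width. The sketch below assumes a 16-bit word address (implied by 64k words per module) and an 8-bit pixel datum; these widths are inferred rather than stated alongside the multiplexing scheme, and the variable names are illustrative only.

```python
from math import ceil

PATH_WIDTH = 4      # bits carried per transfer on the multiplexed lines
ADDRESS_BITS = 16   # assumption: word address within a 64k-word module
DATA_BITS = 8       # assumption: one pixel byte per access

address_transfers = ceil(ADDRESS_BITS / PATH_WIDTH)
data_transfers = ceil(DATA_BITS / PATH_WIDTH)

# Lines through each crosspoint before and after multiplexing.
lines_before = ADDRESS_BITS + DATA_BITS
lines_after = PATH_WIDTH
reduction_factor = lines_before // lines_after

print(address_transfers, data_transfers)             # 4 and 2 transfers
print(lines_before, lines_after, reduction_factor)   # 24 lines reduced to 4
```

Under these assumptions each crosspoint carries 4 lines instead of 24, at the cost of six short transfers per memory cycle.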

4.  Visits  and  Discussions 

In  March  1980  I  visited  De  Anza  Systems  Inc.,  makers 
of  the  latest  generation  of  raster-scan  display 
equipment  supplied  to  RADC  for  automatic  feature 
extraction.  I  discussed  the  principles  of  the  PXP 
processor  with  Mr  C.T.  Masters,  Vice  President  for 
Engineering. He expressed considerable interest in
the  design  and  made  some  useful  suggestions  on  the 
influence  of  current  and  expected  memory  chip  sizes 
on the design. I also visited Advanced Micro Devices
to discuss further developments of the Am2900 bit-slice
microprocessor, which is appropriate for the instruction
rate of PXP. I visited Dr J.P. Gray of the Silicon
Structures  Project  at  the  California  Institute  of 
Technology  and  discussed  the  possibility  of  using 
custom LSI devices for parts of PXP. My conclusions
were that the crossbar switch would present major
difficulties  due  to  the  number  of  external  connections 
involved,  and  it  might  be  necessary  to  integrate  both 
the  access  mechanism  and  the  memory  with  the  switch 
itself. 

5.  Simulation 

The  original  proposal  indicated  that  a  high-level 
description  of  PXP  would  be  prepared  in  the  SMITE 
language  and  emulated  on  the  RADC  QM1  computer.  In 
the  course  of  the  project  I  felt  that  the  memory 
addressing  and  crossbar  switch  designs  needed  more 
attention  at  this  stage,  and  on  the  short  timescale 
of  the  grant  I  have  been  unable  to  carry  out  any 
simulations. This would still be a very useful
exercise.


6. Patent Situation

At the conclusion of my work under the Summer
Faculty Research Program at RADC I submitted an
Abstract of New Technology. I subsequently asked
Dr Miller of S.C.E.E.E. to follow up the patent
situation. I understand that Dr Miller wrote to
AFOSR (3) but that no reply has yet been received.
I would be grateful if the situation could be
resolved as soon as possible.

7. Conclusions and Recommendations

The  design  of  the  PXP  processor  has  been  investigated 
more  thoroughly.  The  crossbar  switch  required  between 
the  vector  elements  and  the  pixel  memories  is  both 
complex  and  crucial  to  the  overall  performance  of 
the  processor.  Its  design  must  be  carried  out  in 
detail  before  an  accurate  estimate  of  the  overall 
performance  advantage  can  be  obtained. 

As the next stage I recommend that the possibility
of constructing a prototype unit be considered. The
vector arithmetic facilities of PXP have been shown
to be very suitable for image processing, and they
should allow typical image processing algorithms
to be executed much faster than on current equipment.